This data analysis intends to provide more insight into the 26 Michelin star restaurants currently active in the city of San Francisco.
The written file linked here gives more context and detail on the process behind this analysis: Written File
To start off, load in the Registered Businesses CSV and the Michelin CSV.
The Registered Businesses dataset is a huge table of every registered business that pays taxes or is licensed in San Francisco and a few neighboring cities. This includes restaurants, retail stores, warehouses, and more. Let’s clean up this dataset and isolate the columns that we will actually use.
## # A tibble: 256,968 × 8
## name address city state zipcode business_start_date neighborhood
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 L. Steiger 328 Ha… San … CA 94102-… 07/24/2009 Hayes Valley
## 2 Executive Merce… 1759 G… San … CA 94123-… 03/25/2000 Marina
## 3 Benson & Neff-C… 1 Post… San … CA 94104 10/01/1968 Financial D…
## 4 Bad Intentions 328 Ha… San … CA 94102-… 07/24/2009 Hayes Valley
## 5 Tribu Partners … 1650 M… San … CA 94111 06/01/2018 Financial D…
## 6 Haas Brothers 2017 L… San … CA 94115-… 10/01/1968 Presidio He…
## 7 Rdi Research Da… 60 Gre… San … CA 94111 06/15/2018 Financial D…
## 8 Union B A City … 2127 U… San … CA 94123-… 09/06/2006 Marina
## 9 Rdi Research Da… 60 Gre… San … CA 94111 06/15/2018 Financial D…
## 10 Rdi Research Da… 60 Gre… San … CA 94111 06/22/2018 Financial D…
## # ℹ 256,958 more rows
## # ℹ 1 more variable: business_location <chr>
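As a rough sketch of the column-selection step, the snippet below builds a tiny stand-in table and keeps only the eight columns shown in the output above. The rows, the `raw_businesses` name, and the `unused_code` column are made up for illustration; the real table would be read from the CSV, whose path and raw column names are not shown here.

```r
library(dplyr)

# Toy stand-in for the raw Registered Businesses table (hypothetical rows).
raw_businesses <- tibble(
  name = c("Example Cafe", "Example Hardware", "Example Books"),
  address = c("1 Example St", "2 Example St", "3 Example St"),
  city = c("San Francisco", "San Francisco", "Oakland"),
  state = "CA",
  zipcode = c("94102", "94110", "94607"),
  business_start_date = c("07/24/2009", "06/01/2018", "03/25/2000"),
  neighborhood = c("Hayes Valley", "Mission", NA),
  business_location = c("POINT (-122.42 37.77)", "POINT (-122.41 37.76)", NA),
  unused_code = c("A1", "B2", "C3")  # hypothetical column we do not need
)

# Keep only the columns used in the rest of the analysis.
registered_businesses <- raw_businesses %>%
  select(name, address, city, state, zipcode,
         business_start_date, neighborhood, business_location)
```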
The Registered Businesses dataset includes a start date for each business, which could make for an interesting piece of analysis. Let’s look at registrations over roughly the past 10 years and see if there is a trend. This could tell us something about the state of investing and entrepreneurship in SF since 2012.
## tibble [256,968 × 2] (S3: tbl_df/tbl/data.frame)
## $ name : chr [1:256968] "L. Steiger" "Executive Mercedes Sedan Srvc" "Benson & Neff-Cpas-A Prof Corp" "Bad Intentions" ...
## $ business_start_date: Date[1:256968], format: "2009-07-24" "2000-03-25" ...
## # A tibble: 143,092 × 3
## name business_start_date year
## <chr> <date> <dbl>
## 1 Tribu Partners Llp 2018-06-01 2018
## 2 Rdi Research Data Insights 2018-06-15 2018
## 3 Rdi Research Data Insights 2018-06-15 2018
## 4 Rdi Research Data Insights 2018-06-22 2018
## 5 Arias Desserts 2018-09-01 2018
## 6 Boniva Nail Spa 2013-12-20 2013
## 7 Classpass 2015-05-01 2015
## 8 Abs Global Trading Limited 2017-10-01 2017
## 9 Creative Bug Preschool 2018-08-22 2018
## 10 Openanesthesia Llc 2014-09-19 2014
## # ℹ 143,082 more rows
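One way the date handling above could look is sketched below: parse the month/day/year strings into dates, pull out the year, keep 2012 onward, and count registrations per year. The rows are invented, and the 2012 cutoff is an assumption taken from the text rather than from the original code.

```r
library(dplyr)

# Hypothetical rows using the same date format as the dataset (MM/DD/YYYY).
businesses <- tibble(
  name = c("Example Cafe", "Example Hardware", "Example Books", "Example Salon"),
  business_start_date = c("07/24/2009", "06/01/2018", "12/20/2013", "05/01/2015")
)

recent <- businesses %>%
  mutate(
    # Convert the character column to a Date, then extract the year as a number.
    business_start_date = as.Date(business_start_date, format = "%m/%d/%Y"),
    year = as.numeric(format(business_start_date, "%Y"))
  ) %>%
  filter(year >= 2012)  # assumed cutoff

# Registrations per year, ready to be plotted as a trend line.
registrations_per_year <- recent %>% count(year)
```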
It looks like the number of businesses being registered has been decreasing since 2016. There is a large dip in 2020, which could be due to COVID, and registrations have held at a steady rate since then. 2023 data is still being collected, so the drop-off at the end is not meaningful yet.
Now that the dataset has been cleaned and is easier to read, let’s explore the structure of San Francisco a bit more. The city is organized into neighborhoods. In how many of them are businesses being registered?
## # A tibble: 42 × 2
## # Groups: neighborhood [42]
## neighborhood n
## <chr> <int>
## 1 Financial District/South Beach 40323
## 2 Mission 20430
## 3 South of Market 16438
## 4 Sunset/Parkside 12314
## 5 Bayview Hunters Point 12065
## 6 Marina 8952
## 7 Outer Richmond 8570
## 8 Chinatown 8500
## 9 Tenderloin 7961
## 10 Castro/Upper Market 7674
## # ℹ 32 more rows
The Financial District/South Beach neighborhood has the largest number of registered businesses.
The table has 42 rows, but the last row is used for extra data, so businesses are being registered in 41 neighborhoods. San Francisco has 41 neighborhoods in total, which means, as one might expect, that every neighborhood in the city has registered businesses.
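A minimal sketch of the neighborhood count is below, using invented rows. It assumes the extra row corresponds to businesses with a missing neighborhood, which the output above does not confirm.

```r
library(dplyr)

# Hypothetical businesses; one has no neighborhood recorded.
businesses <- tibble(
  name = c("A", "B", "C", "D", "E", "F"),
  neighborhood = c("Mission", "Marina", "Mission", NA, "Mission", "Marina")
)

# One row per neighborhood, largest count first.
neighborhood_counts <- businesses %>%
  count(neighborhood, sort = TRUE)

# Number of named neighborhoods, ignoring the row for missing values.
n_named <- neighborhood_counts %>%
  filter(!is.na(neighborhood)) %>%
  nrow()
```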
The San Francisco city data portal has a dataset containing the name of each neighborhood along with a geom that represents its boundary as a polygon. If we join this table to our neighborhood counts, we can use leaflet to plot the polygons.
## # A tibble: 41 × 3
## nhood the_geom n
## <chr> <chr> <int>
## 1 Western Addition MULTIPOLYGON (((-122.42144200043835 37.785567000052… 4828
## 2 West of Twin Peaks MULTIPOLYGON (((-122.46104000042365 37.750957999568… 7317
## 3 Visitacion Valley MULTIPOLYGON (((-122.40385399997592 37.718829999966… 1663
## 4 Twin Peaks MULTIPOLYGON (((-122.44694999987867 37.756549999888… 1168
## 5 South of Market MULTIPOLYGON (((-122.40371199999187 37.784043999707… 16438
## 6 Treasure Island MULTIPOLYGON (((-122.3635827833368 37.8208705562424… 820
## 7 Presidio Heights MULTIPOLYGON (((-122.44629999989719 37.791878999765… 3622
## 8 Presidio MULTIPOLYGON (((-122.44812880222945 37.806891619781… 1187
## 9 Potrero Hill MULTIPOLYGON (((-122.38487045956524 37.767240007151… 6254
## 10 Portola MULTIPOLYGON (((-122.40465699960654 37.732949000443… 1926
## # ℹ 31 more rows
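The join and the map could be put together roughly as follows. The three counts come from the table earlier, but the two square polygons are made-up placeholders rather than real neighborhood boundaries, and the styling choices are only one option. The mapping part runs only if the sf and leaflet packages are installed.

```r
library(dplyr)

# Placeholder polygons in WKT form (simple squares, not real boundaries).
neighborhood_shapes <- tibble(
  nhood = c("Mission", "Marina"),
  the_geom = c(
    "MULTIPOLYGON (((-122.42 37.75, -122.40 37.75, -122.40 37.77, -122.42 37.77, -122.42 37.75)))",
    "MULTIPOLYGON (((-122.45 37.80, -122.43 37.80, -122.43 37.81, -122.45 37.81, -122.45 37.80)))"
  )
)

neighborhood_counts <- tibble(
  neighborhood = c("Mission", "Marina", "Chinatown"),
  n = c(20430L, 8952L, 8500L)
)

# Attach the business count to each polygon; the key columns have different names.
shapes_with_counts <- neighborhood_shapes %>%
  inner_join(neighborhood_counts, by = c("nhood" = "neighborhood"))

if (requireNamespace("sf", quietly = TRUE) &&
    requireNamespace("leaflet", quietly = TRUE)) {
  # Parse the WKT text into geometries, then shade each polygon by its count.
  shapes_sf <- sf::st_as_sf(shapes_with_counts, wkt = "the_geom", crs = 4326)
  pal <- leaflet::colorNumeric("YlOrRd", domain = shapes_sf$n)
  business_map <- leaflet::leaflet(shapes_sf) %>%
    leaflet::addTiles() %>%
    leaflet::addPolygons(
      fillColor = ~pal(n), fillOpacity = 0.6,
      color = "white", weight = 1, label = ~nhood
    )
}
```

An inner join drops any neighborhood that appears in only one of the two tables, which is why Chinatown is absent from the toy result.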
First, join the Michelin_2023 table with the registered_businesses table. This keeps only the businesses in the large table that are Michelin star restaurants. We also have to make sure to remove duplicates, since a business can appear in the registry more than once.
## # A tibble: 26 × 12
## name stars cuisine price `green star` address city state zipcode
## <chr> <dbl> <chr> <dbl> <lgl> <chr> <chr> <chr> <chr>
## 1 Aphotic 1 Seafood 4 TRUE 816 Fo… San … CA 94107-…
## 2 Atelier Crenn 3 Contem… 4 TRUE 3127 F… San … CA 94123-…
## 3 Sons & Daughters 1 Contem… 4 FALSE 708 Bu… San … CA 94108
## 4 The Progress 1 Califo… 3 FALSE 1525 F… San … CA 94115-…
## 5 The Shota 1 Japane… 4 FALSE 115 Sa… San … CA 94104
## 6 Restaurant Nisei 1 Japane… 4 FALSE 755 Bu… San … CA 94131
## 7 Quince Restaura… 3 Contem… 4 TRUE 470 Pa… San … CA 94133-…
## 8 O' By Claude Le… 1 French 4 FALSE 165 O'… San … CA 94102
## 9 Ssal 1 Korean 4 FALSE 2226 P… San … CA 94109
## 10 Acquerello 2 Italian 4 FALSE 1722 S… San … CA 94109-…
## # ℹ 16 more rows
## # ℹ 3 more variables: business_start_date <chr>, neighborhood <chr>,
## # business_location <chr>
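Here is a small sketch of that join, assuming the two tables share a `name` column with identically spelled names. The restaurant names and neighborhoods are placeholders.

```r
library(dplyr)

# Hypothetical Michelin list (one row per restaurant).
michelin_2023 <- tibble(
  name = c("Restaurant A", "Restaurant B"),
  stars = c(3, 1),
  cuisine = c("Italian", "Korean")
)

# Hypothetical registry with a duplicated entry and an unrelated business.
registered_businesses <- tibble(
  name = c("Restaurant A", "Restaurant A", "Restaurant B", "Example Hardware"),
  neighborhood = c("Marina", "Marina", "Mission", "Mission")
)

michelin_businesses <- michelin_2023 %>%
  inner_join(registered_businesses, by = "name") %>%  # keep matches only
  distinct(name, .keep_all = TRUE)                    # one row per restaurant
```

If the names differ in case or punctuation between the two files, they would need to be normalized before joining.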
To make the data plottable, we can use a loop and an API key to get the longitude and latitude of each address. This was made possible with a bit of help from a few Google searches.
## # A tibble: 26 × 15
## name stars cuisine price `green star` address city state zipcode
## <chr> <dbl> <chr> <dbl> <lgl> <chr> <chr> <chr> <chr>
## 1 Aphotic 1 Seafood 4 TRUE 816 Fo… San … CA 94107-…
## 2 Atelier Crenn 3 Contem… 4 TRUE 3127 F… San … CA 94123-…
## 3 Sons & Daughters 1 Contem… 4 FALSE 708 Bu… San … CA 94108
## 4 The Progress 1 Califo… 3 FALSE 1525 F… San … CA 94115-…
## 5 The Shota 1 Japane… 4 FALSE 115 Sa… San … CA 94104
## 6 Restaurant Nisei 1 Japane… 4 FALSE 755 Bu… San … CA 94131
## 7 Quince Restaura… 3 Contem… 4 TRUE 470 Pa… San … CA 94133-…
## 8 O' By Claude Le… 1 French 4 FALSE 165 O'… San … CA 94102
## 9 Ssal 1 Korean 4 FALSE 2226 P… San … CA 94109
## 10 Acquerello 2 Italian 4 FALSE 1722 S… San … CA 94109-…
## # ℹ 16 more rows
## # ℹ 6 more variables: business_start_date <chr>, neighborhood <chr>,
## # business_location <chr>, full_address <chr>, latitude <dbl>,
## # longitude <dbl>
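The loop might look something like the sketch below. The `geocode_address()` function is a hypothetical stand-in that looks addresses up in a small hard-coded list; a real version would send the address and API key to a geocoding service and parse the response. The addresses and coordinates are invented.

```r
# Hypothetical stand-in for a real geocoding request.
geocode_address <- function(address, api_key) {
  known <- list(
    "1 Example St, San Francisco, CA" = c(lat = 37.77, lon = -122.42),
    "2 Example St, San Francisco, CA" = c(lat = 37.76, lon = -122.41)
  )
  result <- known[[address]]
  # Return missing values when the address cannot be resolved.
  if (is.null(result)) c(lat = NA_real_, lon = NA_real_) else result
}

restaurants <- data.frame(
  name = c("Restaurant A", "Restaurant B", "Restaurant C"),
  address = c("1 Example St", "2 Example St", "3 Example St"),
  city = "San Francisco",
  state = "CA"
)

# Build one full address string per restaurant for the geocoder.
restaurants$full_address <- paste(
  restaurants$address, restaurants$city, restaurants$state, sep = ", "
)
restaurants$latitude <- NA_real_
restaurants$longitude <- NA_real_

# Look up each address in turn and store its coordinates.
for (i in seq_len(nrow(restaurants))) {
  coords <- geocode_address(restaurants$full_address[i], api_key = "YOUR_API_KEY")
  restaurants$latitude[i] <- coords[["lat"]]
  restaurants$longitude[i] <- coords[["lon"]]
}
```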
Some neighborhoods look like they have a higher quantity of Michelin star restaurants, while others have none at all. Let’s find out how many each one has.
## # A tibble: 14 × 2
## neighborhood count
## <chr> <int>
## 1 Financial District/South Beach 5
## 2 Mission 3
## 3 South of Market 3
## 4 Marina 2
## 5 Nob Hill 2
## 6 Russian Hill 2
## 7 Western Addition 2
## 8 Inner Richmond 1
## 9 Japantown 1
## 10 Mission Bay 1
## 11 North Beach 1
## 12 Presidio Heights 1
## 13 Tenderloin 1
## 14 Twin Peaks 1
It looks like only 14 of the 41 neighborhoods contain Michelin star restaurants while the others have none.
The Financial District/South Beach neighborhood has the most Michelin star restaurants, with 5 of them! This is consistent with what we found earlier: it also has the largest number of registered businesses overall.
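To see both sides of this, a sketch like the one below counts restaurants per neighborhood and then uses an anti-join to list the neighborhoods with none. The restaurant rows and the four-neighborhood list are placeholders.

```r
library(dplyr)

# Hypothetical full list of neighborhoods and a few starred restaurants.
all_neighborhoods <- tibble(
  nhood = c("Mission", "Marina", "Chinatown", "Portola")
)
michelin_businesses <- tibble(
  name = c("Restaurant A", "Restaurant B", "Restaurant C"),
  neighborhood = c("Mission", "Mission", "Marina")
)

# Restaurants per neighborhood, largest first.
michelin_by_neighborhood <- michelin_businesses %>%
  count(neighborhood, name = "count", sort = TRUE)

# Neighborhoods that never appear in the count table.
neighborhoods_without_stars <- all_neighborhoods %>%
  anti_join(michelin_by_neighborhood, by = c("nhood" = "neighborhood"))
```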
It would also be interesting to see what the most popular cuisine for a Michelin Star restaurant in San Francisco is.
## # A tibble: 12 × 2
## # Groups: cuisine [12]
## cuisine n
## <chr> <int>
## 1 Contemporary 10
## 2 Californian 3
## 3 Japanese 2
## 4 Korean 2
## 5 Thai 2
## 6 Asian 1
## 7 Chinese 1
## 8 French 1
## 9 Italian 1
## 10 Mexican 1
## 11 Seafood 1
## 12 Steakhouse 1
We can group Japanese, Korean, Chinese, and Thai cuisines together under the broader category of Asian food.
## # A tibble: 8 × 2
## # Groups: grouped_cuisine [8]
## grouped_cuisine n
## <chr> <int>
## 1 Contemporary 10
## 2 Asian 8
## 3 Californian 3
## 4 French 1
## 5 Italian 1
## 6 Mexican 1
## 7 Seafood 1
## 8 Steakhouse 1
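The regrouping can be sketched as below. The cuisine vector is rebuilt from the counts in the first cuisine table, and the four cuisines folded into "Asian" are the ones named in the text; the variable names are our own.

```r
library(dplyr)

# Rebuild one row per restaurant from the cuisine counts shown above.
michelin <- tibble(
  cuisine = rep(
    c("Contemporary", "Californian", "Japanese", "Korean", "Thai", "Asian",
      "Chinese", "French", "Italian", "Mexican", "Seafood", "Steakhouse"),
    times = c(10, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1)
  )
)

asian_cuisines <- c("Japanese", "Korean", "Chinese", "Thai")

cuisine_counts <- michelin %>%
  # Relabel the four specific cuisines; leave everything else unchanged.
  mutate(grouped_cuisine = if_else(cuisine %in% asian_cuisines, "Asian", cuisine)) %>%
  count(grouped_cuisine, sort = TRUE)
```

The Asian total of 8 comes from 2 Japanese, 2 Korean, 2 Thai, and 1 Chinese restaurant plus the 1 already labeled Asian.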
The most popular cuisine is Contemporary, and the second most popular is Asian.
Food often reflects the people who live in a place. Maybe the most popular cuisines, Contemporary and Asian, reflect the demographics of San Francisco, since a larger community could mean higher demand for its food. Let’s use census data to try to visualize this.
## # A tibble: 6 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 06075 San Francisco County, California total 865933 NA
## 2 06075 San Francisco County, California white 376056 2406
## 3 06075 San Francisco County, California black 45135 1290
## 4 06075 San Francisco County, California american_indian_alaska_… 4212 762
## 5 06075 San Francisco County, California asian 297680 2011
## 6 06075 San Francisco County, California hawaiian_pacific_island… 3111 478
## # A tibble: 7 × 6
## GEOID NAME variable estimate moe percentage
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 06075 San Francisco County, California white 376056 2406 0.434
## 2 06075 San Francisco County, California black 45135 1290 0.0521
## 3 06075 San Francisco County, California american_ind… 4212 762 0.00486
## 4 06075 San Francisco County, California asian 297680 2011 0.344
## 5 06075 San Francisco County, California hawaiian_pac… 3111 478 0.00359
## 6 06075 San Francisco County, California other 67137 2785 0.0775
## 7 06075 San Francisco County, California two_or_more 72602 3060 0.0838
## [1] 1
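The percentage column can be reproduced from the estimates shown above by dividing each group by the county total. The columns resemble what `tidycensus::get_acs()` returns, though the original call is not shown, so this sketch just types the numbers in. The labels `aian` and `nhpi` are our own shorthand for the two variable names that are cut off in the output.

```r
library(dplyr)

total_population <- 865933  # the "total" row from the census table

race_estimates <- tibble(
  variable = c("white", "black", "aian", "asian", "nhpi", "other", "two_or_more"),
  estimate = c(376056, 45135, 4212, 297680, 3111, 67137, 72602)
)

# Share of the county population in each group.
race_shares <- race_estimates %>%
  mutate(percentage = estimate / total_population)

# Sanity check: the shares should add up to 1.
share_total <- sum(race_shares$percentage)
```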
San Francisco has a large Asian population: Asian residents make up about 34% of the population, the second-largest group after white residents at about 43%. This may be related to the number of Michelin star restaurants in San Francisco that serve Asian food, although this data alone cannot show a causal link.
Food is something that can be very unifying, and the Michelin star system represents distinction for those restaurants that do an especially exquisite job at delivering an unforgettable dining experience. As we head into the new year and wait for 2024’s Michelin guide release, let’s be sure to share a few bites to eat with those that we love.